---
title: "Midterm"
author: "Jamie Zhang"
output: 
  flexdashboard::flex_dashboard:
    theme:
      version: 4
      bootswatch: sketchy
      navbar-bg: "#42033D"
    orientation: columns
    vertical_layout: fill
    source_code: embed
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
library(flexdashboard)
library(tidyverse)
library(plotly)
library(DT)
```

Overview
===
Column {data-width=350}
---
We looked over the <span Style="color:#854798">daily average PM2.5</span> data and created some graphs to go along with it.

Here is a glimpse at the data in the dataset.

Column {data-width=550}
---
### Data

```{r}
# Load the average PM2.5 data
pm <- read_csv("avgpm25.csv")
# Show the first 500 rows in an interactive table
datatable(head(pm, 500), rownames = FALSE, colnames = c("PM2.5", "Fips", "Region", "Longitude", "Latitude"), options = list(pageLength = 20))
```

Box Plot
===

Column {data-width=350}
---
Here is a <span Style="color:#854798">Box Plot</span> of the PM2.5 data.

### Analysis

Apart from the outliers, the distribution is fairly symmetric. The outliers on the right are more numerous and lie farther from the median than those on the left.
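
To back up the outlier reading with numbers, here is a minimal sketch of the 1.5 × IQR rule, which is the default coefficient `geom_boxplot()` uses to decide which points to draw as outliers. The vector `pm25_toy` and its values are made up for illustration; with the real data you would use `pm$pm25` instead.

```r
# Toy values invented for illustration only; swap in pm$pm25 for the real data.
pm25_toy <- c(3.4, 7.9, 8.8, 9.5, 9.9, 10.2, 10.6, 11.1, 11.8, 12.4, 16.2, 18.4)

# First and third quartiles, and the interquartile range between them
q <- quantile(pm25_toy, c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]

# Points beyond these fences are the ones a box plot shows as outliers
lower_fence <- q[1] - 1.5 * iqr
upper_fence <- q[2] + 1.5 * iqr

low_outliers  <- pm25_toy[pm25_toy < lower_fence]
high_outliers <- pm25_toy[pm25_toy > upper_fence]

# Compare how many outliers fall on each side
c(low = length(low_outliers), high = length(high_outliers))
```

Comparing the two counts, and how far the outliers sit from the fences, is one way to check whether the right tail really is heavier.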

Column {data-width=550}
---
### Box Plot

```{r}
# Horizontal box plot of PM2.5; map the column by name rather than pm$pm25
ggplot(pm, aes(x = pm25)) +
  geom_boxplot(fill = "#680E4B", color = "gray10") +
  labs(title = "PM2.5", x = "µg/m^3") +
  ylim(-0.6, 0.6)
```

Air Quality
===

Column {data-width=350}
---
Note that the national ambient air quality standard for annual PM2.5 used to be 15 micrograms per cubic meter; it was later lowered to 12.

### Analysis

Fresno County, Kern County, Kings County, Los Angeles County, Merced County, Riverside County, Stanislaus County, and Tulare County all exceed the former air quality standard of 15 micrograms per cubic meter. These counties are all in California.
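
A list like this can be pulled out by filtering on the threshold instead of reading values off a sorted table. The sketch below assumes a data frame with `pm25` and `fips` columns like the ones in `pm`; `pm_toy`, its values, and the placeholder ids are invented for illustration and are not real counties.

```r
# Toy data invented for illustration; the ids are placeholders, not real FIPS codes.
pm_toy <- data.frame(
  pm25 = c(9.8, 16.7, 12.1, 18.4, 15.0, 7.3),
  fips = c("id_a", "id_b", "id_c", "id_d", "id_e", "id_f"),
  stringsAsFactors = FALSE
)

# Former annual standard, in micrograms per cubic meter
standard <- 15

# Keep only rows strictly above the standard, highest first
over <- pm_toy[pm_toy$pm25 > standard, ]
over <- over[order(over$pm25, decreasing = TRUE), ]

over
```

Filtering this way means the number of rows shown follows from the data rather than from a fixed count.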

Column {data-width=550}
---
### Data

```{r}
pm %>%
  # Sort descending so the eight highest PM2.5 values come first
  arrange(desc(pm25)) %>%
  head(8) %>%
  datatable(rownames = FALSE, colnames = c("PM2.5", "Fips", "Region", "Longitude", "Latitude"), options = list(pageLength = 8))
```

Box/Violin Plots
===

Column {data-width=350}
---
Here are two side-by-side plots to explore the difference in PM2.5 levels between the eastern and western U.S.

### Analysis

Eastern values cluster more tightly around the median, while the west has more outliers. Western values concentrate toward the lower end of the range and eastern values toward the higher end.
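
The comparison above can also be checked numerically with a per-region median and interquartile range. This is a rough sketch that assumes a `region` column with the labels `"east"` and `"west"`; the data frame `pm_toy` and its values are made up for illustration.

```r
# Toy data invented for illustration; region labels are an assumption.
pm_toy <- data.frame(
  pm25 = c(10.5, 11.0, 11.4, 11.9, 12.3, 4.1, 6.0, 7.2, 9.8, 17.5),
  region = rep(c("east", "west"), each = 5)
)

# Median shows where each region's values sit; IQR shows how tightly they cluster
medians <- tapply(pm_toy$pm25, pm_toy$region, median)
iqrs <- tapply(pm_toy$pm25, pm_toy$region, IQR)

summary_tbl <- data.frame(
  region = names(medians),
  median = as.numeric(medians),
  iqr = as.numeric(iqrs)
)

summary_tbl
```

A smaller IQR for one region would support the claim that its values sit closer to the median.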

Column {.tabset data-width=550}
---
### Box Plots
```{r}
# One box per region; columns are mapped by name rather than with pm$
ggplot(pm, aes(x = pm25, y = region)) +
  geom_boxplot(fill = "#680E4B", color = "gray10") +
  labs(y = "Region", x = "PM2.5")
```


### Violin Plots
```{r}
# One violin per region, showing the shape of each distribution
ggplot(pm, aes(x = pm25, y = region)) +
  geom_violin(fill = "#680E4B", color = "gray10") +
  labs(y = "Region", x = "PM2.5")
```

Histograms
===

Column {data-width=350}
---

### Analysis

Some values lie above the former standard of 15 micrograms per cubic meter.

Eastern values cluster more tightly around the median, while the west has more outliers. Western values concentrate toward the lower end of the range and eastern values toward the higher end.

Column {.tabset data-width=550}
---
### Histogram with Cutoff

Here is a histogram of the PM2.5 data. The vertical line on the histogram shows the cutoff value.
```{r}
# Histogram of PM2.5 with the cutoff of 15 marked by a vertical line
ggplot(pm, aes(x = pm25)) +
  geom_histogram(binwidth = 1, fill = "#680E4B", color = "gray10") +
  geom_vline(xintercept = 15, color = "#854798") +
  # annotate() draws the label once instead of once per data row
  annotate("text", x = 15, y = 10, label = "Cutoff: 15", color = "#854798", vjust = -0.5, hjust = 0) +
  labs(x = "PM2.5", y = "Frequency")
```

### Histogram by Region

Here are histograms of the PM2.5 data, one for each region.
```{r}
# One histogram panel per region, stacked vertically
ggplot(pm, aes(x = pm25)) +
  geom_histogram(fill = "#680E4B", color = "gray10") +
  labs(x = "PM2.5") +
  facet_wrap(~region, nrow = 2)
```

Scatterplots
===

Column {data-width=350}
---

### Analysis

The eastern data show less deviation than the western data.

Column {.tabset data-width=550}
---
### Scatterplot 1
```{r}
# PM2.5 against latitude, with points colored by region
ggplot(pm, aes(x = pm25, y = latitude, color = region)) +
  geom_point() +
  labs(x = "PM2.5", y = "Latitude", color = "Region")
```

### Scatterplot 2
```{r}
# Same scatterplot, split into one panel per region
ggplot(pm, aes(x = pm25, y = latitude)) +
  geom_point(color = "#680E4B") +
  labs(x = "PM2.5", y = "Latitude") +
  facet_wrap(~region, ncol = 2)
```

Correlogram
===

Column {data-width=350}
---

### Analysis

There is a high correlation between longitude and pm25 but a low correlation between latitude and pm25, as indicated by the size of the pie and the shade of the color.
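
The pies and shades in a correlogram encode correlation coefficients, so the same comparison can be read as numbers from `cor()`. The sketch below uses a made-up data frame `pm_toy` with the same three column names; its values were invented only so the call has something to run on and say nothing about the real correlations.

```r
# Toy data invented for illustration; not taken from avgpm25.csv.
pm_toy <- data.frame(
  pm25 = c(5.2, 6.8, 8.1, 9.9, 11.4, 12.6),
  longitude = c(-120, -112, -104, -95, -86, -78),
  latitude = c(44, 33, 47, 30, 41, 36)
)

# Pearson correlation matrix for all three columns
cors <- cor(pm_toy)

# The row for pm25 gives its correlation with longitude and latitude
cors["pm25", c("longitude", "latitude")]
```

With the real data, `cor(select(pm, pm25, longitude, latitude))` should give the values behind the plot.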

Column {data-width=550}
---
### Correlogram
```{r}
library(corrgram)
# Shaded cells in the lower panel, pie charts in the upper panel
pm %>%
  select(pm25, longitude, latitude) %>%
  corrgram(order = TRUE, lower.panel = panel.shade, upper.panel = panel.pie, text.panel = panel.txt)
```